Reminder of data types in Bioconductor

We will also encounter and make use of many data structures and data types which we have seen throughout our courses on HTS data. You can revisit this material to refresh on HTS data analysis in Bioconductor and R below.

Materials. - Presentations, source code and practicals.

Once the zip file in unarchived. All presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.

  • viz_course/presentations/Slides/ Presentations as an HTML slide show.
  • viz_course/presentations/exercises/ Some tasks/examples to work through.

Set the Working directory

Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.

You may navigate to the unarchived VisualisingGenomicsData folder in the Rstudio menu

Session -> Set Working Directory -> Choose Directory

or in the console.

setwd("/PathToMyDownload/VisualizingGenomicsData/viz_course/presentations/Slides") 
# e.g. setwd("~/Downloads/VisualizingGenomicsData/viz_course/presentations/Slides") 

Covered so far. (Adding data from GRanges/bigWigs)

We can create DataTrack objects from GRanges objects and plot these alongside our GenomeAxisTrack object using the plotTracks() functions.

library(rtracklayer) 
allChromosomeCoverage <- import.bw("../../Data/activatedReads.bw",
                                   as="GRanges") 
accDT <- DataTrack(allChromosomeCoverage,chomosome="chr17") 
plotTracks(c(accDT,genomeAxis), 
           from=47504051,to=47600688, 
           chromosome="chr17",type="hist") 

SequenceTracks

When displaying genomics data it can be important to illustrate the underlying sequence for the genome being viewed.

Gviz uses SequenceTrack objects to handle displaying sequencing information.

First we need to get some sequence information for our genome of interest to display. Here we will use one of the BSgenome packages specific for hg19 - BSgenome.Hsapiens.UCSC.hg19. This contains the full sequence for hg19 as found in UCSC

library(BSgenome.Hsapiens.UCSC.hg19) 
BSgenome.Hsapiens.UCSC.hg19[["chr7"]] 
##   159138663-letter "DNAString" instance
## seq: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

SequenceTracks - From a DNAstringset object

We can also specify a DNAstringset object which we have encountered in the Bioconductor courses.

dsSet <- DNAStringSet(Hsapiens[["chr7"]]) 
names(dsSet) <- "chr7" 
sTrack <- SequenceTrack(dsSet) 
plotTracks(sTrack,from=134887024,to=134887074, 
           chromosome = "chr7",cex=2.5) 

SequenceTracks - Displaying complement sequence

As with IGV, the sequence can be displayed as its complement. This is performed here by setting the complement argument to the plotTracks() function to TRUE/T.

sTrack <- SequenceTrack("../../Data/chr7Short.fa") 
plotTracks(sTrack,from=1,to=50, 
           chromosome = "chr7",complement=T,cex=3) 

SequenceTracks - Displaying strand information

We can also add 5’ to 3’ direction as we have for plotting GenomeAxisTrack objects using the add53 parameter. This allows for a method to illustrate the strand of the sequence being diplayed.

sTrack <- SequenceTrack("../../Data/chr7Short.fa") 
plotTracks(sTrack,from=1,to=50, 
           chromosome = "chr7",complement=F, 
           add53=T,cex=2.5) 

SequenceTracks - Displaying strand information

Notice the 5’ and 3’ labels have swapped automatically when we have specified the complement sequence.

sTrack <- SequenceTrack("../../Data/chr7Short.fa") 
plotTracks(sTrack,from=1,to=50, 
           chromosome = "chr7",complement=T, 
           add53=T,cex=2.5) 

SequenceTracks - Controlling base display size

We can control the size of bases with the cex parameter, as with the standard R plotting.

An interesting feature of this is that when plotted bases overlap, Gviz will provide a colour representation of bases instead of the bases’ characters.

plotTracks(sTrack,from=1,to=50, 
           chromosome = "chr7",cex=2.5) 

plotTracks(sTrack,from=1,to=50, 
           chromosome = "chr7", 
           cex=5) 

SequenceTracks - Controlling base display size

If we want to colour bases by own colour scheme, we can provide a named vector of our desired colours to the fontcolor parameter.

colForLetters <- c(A = "darkgrey", C = "red",
                    T = "darkgrey",G = "darkgrey")
plotTracks(sTrack,from=1,to=50, 
           chromosome = "chr7",cex=3,
           fontcolor=colForLetters) 

AlignmentsTrack. Plotting Aligned Reads in Gviz

The AlignmentsTrack object can be plotted in the same manner as tracks using plotTracks() function.

Since the BAM file may contain information from all chromosomes we need to specify a chromsome to plot in the chromosome parameter and here we specify the from and to parameters too.

   plotTracks(peakReads, 
              chromosome="chr5", 
              from=135312577, 
              to=135314146) 

AlignmentsTrack. Plotting Aligned Reads in Gviz

   plotTracks(peakReads, 
              chromosome="chr5", 
              from=135312577, 
              to=135314146) 

By default AlignmentTracks are rendered as both the reads themselves and the calculated coverage/signal depth from these reads.

Reads, as with AnnotationTrack objects, show the strand of the aligned read by the direction of the arrow.

AlignmentsTrack. Plotting Aligned Reads in Gviz

The type of plot/plots produced can be controlled by the type argument as we have done for DataTrack objects.

The valid types of plots for AlignmentsTrack objects are “pileup”, “coverage” and “sashimi” (We’ve come across sashimi plots before).

The type “pileup” displays just the reads.

   plotTracks(peakReads, 
              chromosome="chr5", 
              from=135312577, 
              to=135314146, 
              type="pileup") 

AlignmentsTrack. Plotting Aligned Reads in Gviz

The type “coverage” displays just the coverage (depth of signal over genomic positions) calculated from the genomic alignments.

   plotTracks(peakReads, 
              chromosome="chr5", 
              from=135312577, 
              to=135314146, 
              type="coverage") 

AlignmentsTrack. Plotting Aligned Reads in Gviz

As we have seen the default display is a combination of “pileup” and “coverage”.

We can provide multiple type arguments to the plotTracks() function as a vector of valid types. The order in vector here does not affect the display order in panels.

   plotTracks(peakReads, 
              chromosome="chr5", 
              from=135312577, 
              to=135314146, 
              type=c("pileup","coverage")) 

AlignmentsTrack. Sashimi plots in Gviz

To recapitulate this plot, we have retrieved the subsection of BodyMap data as BAM files.

First we must create two AlignmentsTrack objects, one for each tissue’s BAM file of aligned reads.

In this case since we are working with paired-end reads we must specify this by setting the isPaired parameter to TRUE

heartReads <- AlignmentsTrack("../../Data/heart.bodyMap.bam", 
                           isPaired = TRUE) 
liverReads <- AlignmentsTrack("../../Data/liver.bodyMap.bam",  
                           isPaired = TRUE) 
 
liverReads 
## ReferenceAlignmentsTrack 'AlignmentsTrack'
## | genome: NA
## | active chromosome: chrNA
## | referenced file: ../../Data/liver.bodyMap.bam
## | mapping: id=id, cigar=cigar, mapq=mapq, flag=flag, isize=isize, groupid=groupid, status=status, md=md, seq=seq

AlignmentsTrack. Sashimi plots in Gviz

To reproduce a plot similar to that in IGV we can simply include the “sashimi” type in the type parameter vector, here alongside “coverage”

plotTracks(c(heartReads,liverReads), 
           chromosome="chr12", 
           from=98986825, 
           to=98997877, 
           type=c("coverage","sashimi")) 

AlignmentsTrack. Highlighting genomic alignment information.

The AlignmentTrack object allows for specific parameters controlling how reads are displayed to be passed to the plotTracks() function.

A few useful parameters are col.gaps and col.mates or lty.gap and lty.mates which will allow us to better distinguish between gapped alignments (split reads) and gaps between read pairs respectively.

plotTracks(c(liverReads), 
           chromosome="chr12", 
           from=98986825,to=98997877, 
           col.gap="Red",col.mate="Blue") 

AlignmentsTrack. Highlighting genomic alignment information.

Similarly using lty.gap and lty.mate parameters.

plotTracks(c(liverReads), 
           chromosome="chr12", 
           from=98986825,to=98997877, 
           lty.gap=2,lty.mate=1) 

Line width may also be controlled with lwd.gap and lwd.mate parameters continuing the similarities to Base R plotting.

AlignmentsTrack. Highlighting mismatches to reference.

A common purpose in visualising alignment data in broswers is review information relating to mismatches to the genome which may be related to SNPs.

In order to highlight mismatches to the genome reference sequence we must first provide some information on the reference sequence.

One method for this is to attach sequence information to the AlignmentsTrack itself by providing a SequenceTrack object to referenceSequence parameter in the AlignmentsTrack() constructor. Here we can create the SequenceTrack object ffor mouse.

library(BSgenome.Mmusculus.UCSC.mm10)
sTrack <- SequenceTrack(BSgenome.Mmusculus.UCSC.mm10) 
activatedReads <- AlignmentsTrack("../../Data/activatedSNPread.bam", 
                           isPaired = TRUE, 
                           referenceSequence=sTrack) 

AlignmentsTrack. Highlighting mismatches to reference.

We could also specify the SequenceTrack in the plotTracks() function as shown for the liver reads example here. Here we simply include the relevant SequenceTrack object as a track to be plotted alongside the BAM.

plotTracks(c(activatedReads,sTrack), 
           chromosome="chr6", 
           from=124815373,to=124815412,cex=2) 

Bringing in External data. Gene models through Biomart

The biomaRt Bioconductor package to programatically query various Biomarts.

Gviz allows us to both query Biomarts and automatically create a GeneRegionTrack using the BiomartGeneRegionTrack objects and BiomartGeneRegionTrack() constructor.

Bringing in External data. Gene models through Biomart

Here we construct a simple BiomartGeneRegionTrack object using the parameters to define locations of interest - “chromsome”, “start”,“end”,“genome” as well as the Biomart to use, in this case Ensembl by setting the name parameter.

bgrTrack <- BiomartGeneRegionTrack(genome="hg19", 
                                   start=26591341, 
                                   end=27034958, 
                                   chromosome = "chr7", 
                                   name="ENSEMBL") 

Bringing in External data. Gene models through Biomart

We can also specify filters in the BiomartGeneRegionTrack() constructor using the filter parameter.

Gviz uses the BiomaRt Bioconductor package to query the Biomarts so we can review availble options using BiomaRt’s useMart() function to list available data collections and listDatasets() function to review all available datasets within the ENSEMBL_MART_ENSEMBL mart.

library(biomaRt)
martList <- listMarts()
mart = useMart("ENSEMBL_MART_ENSEMBL")
dataList <- listDatasets(mart)
mart = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl") 
filterList <- listFilters(mart)

Bringing in External data. Gene models through Biomart

Once we have retrieved our filtered gene models we can plot them as before.

plotTracks(bgrTrack) 

offset

Bringing in External data. Tracks from UCSC

To understand which tables are available we can query the rtracklayer package to identify track and table names.

We can use the broswerSession() function to connect to UCSC and the genome() function to establish the genome of interest.

We can then review available tracks using the trackNames() function.

library(rtracklayer) 
session <- browserSession() 
genome(session) <- "hg19" 
trackNames(session) 
##               Base Position              Alt Haplotypes 
##                     "ruler"              "altLocations" 
##                    Assembly               BAC End Pairs 
##                      "gold"               "bacEndPairs" 
##                   BU ORChID             Chromosome Band 
##          "wgEncodeBuOrchid"                  "cytoBand" 
##               deCODE Recomb                ENCODE Pilot 
##                "decodeRmap"             "encodeRegions" 
##                 FISH Clones            Fosmid End Pairs 
##                "fishClones"               "fosEndPairs" 
##                         Gap                  GC Percent 
##                       "gap"                   "gc5Base" 
##                GRC Incident             GRC Map Contigs 
##             "grcIncidentDb"                   "ctgPos2" 
##           GRC Patch Release                   Hg18 Diff 
##         "altSeqComposite10"            "hg19ContigDiff" 
##                   Hg38 Diff                Hi Seq Depth 
##            "hg38ContigDiff"                "hiSeqDepth" 
##                       INSDC                 LRG Regions 
##               "ucscToINSDC"                       "lrg" 
##                 Map Contigs                 Mappability 
##                    "ctgPos"        "wgEncodeMapability" 
##                 Recomb Rate                  RefSeq Acc 
##                "recombRate"              "ucscToRefSeq" 
##               Restr Enzymes                 Short Match 
##                   "cutters"                "oligoMatch" 
##                 STS Markers                  Wiki Track 
##                    "stsMap"                 "wikiTrack" 
##                  UCSC Genes                 NCBI RefSeq 
##                 "knownGene"           "refSeqComposite" 
##               AceView Genes                    AUGUSTUS 
##                   "acembly"              "augustusGene" 
##                        CCDS                   CRISPR... 
##                  "ccdsGene"                    "crispr" 
##               Ensembl Genes                     EvoFold 
##                   "ensGene"                   "evofold" 
##                    Exoniphy                  GENCODE... 
##                  "exoniphy"      "wgEncodeGencodeSuper" 
##                Geneid Genes               Genscan Genes 
##                    "geneid"                   "genscan" 
##                   H-Inv 7.0           IKMC Genes Mapped 
##           "hinv70Composite"                    "hgIkmc" 
##                 lincRNAs...             LRG Transcripts 
##                  "lincRNAs"          "lrgTranscriptAli" 
##                   MGC Genes                      N-SCAN 
##               "mgcFullMrna"                 "nscanGene" 
##              Old UCSC Genes              ORFeome Clones 
##             "knownGeneOld6"               "orfeomeMrna" 
##                Other RefSeq           Pfam in UCSC Gene 
##               "xenoRefGene"              "ucscGenePfam" 
##            Retroposed Genes                   SGP Genes 
##             "ucscRetroAli5"                   "sgpGene" 
##                   SIB Genes                   sno/miRNA 
##                   "sibGene"                     "wgRna" 
##                 TransMap...                  tRNA Genes 
##                "transMapV4"                     "tRNAs" 
##             UCSC Alt Events                     UniProt 
##                  "knownAlt"                   "uniprot" 
##                  Vega Genes               Yale Pseudo60 
##         "vegaGeneComposite"              "pseudoYale60" 
##                Publications                ClinGen CNVs 
##                      "pubs"             "iscaComposite" 
##            ClinVar Variants                Coriell CNVs 
##                   "clinvar"             "coriellDelDup" 
##              COSMIC Regions               DECIPHER CNVs 
##             "cosmicRegions"                  "decipher" 
##           Development Delay                    GAD View 
##               "cnvDevDelay"                       "gad" 
##           Gene Interactions                 GeneReviews 
##              "interactions"               "geneReviews" 
##                GWAS Catalog               HGMD Variants 
##               "gwasCatalog"                      "hgmd" 
##                Lens Patents               LOVD Variants 
##                    "patSeq"                      "lovd" 
##               MGI Mouse QTL                OMIM Alleles 
##              "jaxQtlMapped"                 "omimAvSnp" 
##                  OMIM Genes             OMIM Pheno Loci 
##                 "omimGene2"              "omimLocation" 
##               RGD Human QTL                 RGD Rat QTL 
##                    "rgdQtl"                 "rgdRatQtl" 
##                     SNPedia            UniProt Variants 
##                   "snpedia"                     "spMut" 
##               Web Sequences                   CGAP SAGE 
##              "pubsBingBlat"                  "cgapSage" 
##                 Gene Bounds                       H-Inv 
##                "rnaCluster"              "HInvGeneMrna" 
##                  Human ESTs                 Human mRNAs 
##                       "est"                      "mrna" 
##           Human RNA Editing                  Other ESTs 
##                    "darned"                   "xenoEst" 
##                 Other mRNAs                     Poly(A) 
##                  "xenoMrna"                     "polyA" 
##                   PolyA-Seq            SIB Alt-Splicing 
##             "polyASeqSites"                "sibTxGraph" 
##                Spliced ESTs                     UniGene 
##                 "intronEst"                 "uniGene_3" 
##                   GTEx Gene             GTEx Transcript 
##                  "gtexGene"            "gtexTranscExpr" 
##             Affy Exon Array                  Affy GNF1H 
##             "affyExonArray"                 "affyGnf1h" 
##                Affy RNA Loc                    Affy U95 
##       "wgEncodeAffyRnaChip"                   "affyU95" 
##                   Affy U133              Affy U133Plus2 
##                  "affyU133"             "affyU133Plus2" 
##                 Allen Brain               Burge RNA-seq 
##             "allenBrainAli" "burgeRnaSeqGemMapperAlign" 
##          CSHL Small RNA-seq           ENC Exon Array... 
##   "wgEncodeCshlShortRnaSeq"    "wgEncodeExonArraySuper" 
##             ENC ProtGeno...              ENC RNA-seq... 
##     "wgEncodeProtGenoSuper"       "wgEncodeRnaSeqSuper" 
##                 GIS RNA PET                 GNF Atlas 2 
##         "wgEncodeGisRnaPet"                 "gnfAtlas2" 
##           GWIPS-viz Riboseq               Illumina WG-6 
##           "gwipsvizRiboseq"            "illuminaProbes" 
##                PeptideAtlas                qPCR Primers 
##          "peptideAtlas2014"               "qPcrPrimers" 
##              RIKEN CAGE Loc                Sestan Brain 
##         "wgEncodeRikenCage"          "sestanBrainAtlas" 
##        ENCODE Regulation...          GTEx Combined eQTL 
##               "wgEncodeReg"           "gtexEqtlCluster" 
##            GTEx Tissue eQTL                 CD34 DnaseI 
##            "gtexEqtlTissue"                "eioJcviNAS" 
##              CpG Islands...            ENC Chromatin... 
##            "cpgIslandSuper"        "wgEncodeChromSuper" 
##           ENC DNA Methyl...          ENC DNase/FAIRE... 
##    "wgEncodeDnaMethylSuper"        "wgEncodeDNAseSuper" 
##              ENC Histone...          ENC RNA Binding... 
##      "wgEncodeHistoneSuper"          "wgEncodeRbpSuper" 
##           ENC TF Binding...              FSU Repli-chip 
##    "wgEncodeTfBindingSuper"      "wgEncodeFsuRepliChip" 
##             Genome Segments           NKI Nuc Lamina... 
##   "wgEncodeAwgSegmentation"              "laminB1Super" 
##                    ORegAnno            Stanf Nucleosome 
##                  "oreganno"         "wgEncodeSydhNsome" 
##             SUNY SwitchGear              SwitchGear TSS 
##    "wgEncodeSunySwitchgear"               "switchDbTss" 
##              TFBS Conserved              TS miRNA sites 
##             "tfbsConsSites"               "targetScanS" 
##           UCSF Brain Methyl             UMMS Brain Hist 
##           "ucsfBrainMethyl"         "uMassBrainHistone" 
##                UW Repli-seq             Vista Enhancers 
##        "wgEncodeUwRepliSeq"            "vistaEnhancers" 
##                Conservation                 Cons 46-Way 
##                "cons100way"                 "cons46way" 
##            Cons Indels MmCf                     Evo Cpg 
##      "consIndelsHgMmCanFam"                    "evoCpg" 
##                        GERP              phastBias gBGC 
##              "allHg19RS_BW"                 "phastBias" 
##           Primate Chain/Net         Placental Chain/Net 
##           "primateChainNet"         "placentalChainNet" 
##        Vertebrate Chain/Net                 5% Lowest S 
##        "vertebrateChainNet"                "ntSssTop5p" 
##            H-C Coding Diffs           Neandertal Methyl 
##      "ntHumChimpCodingDiff"     "neandertalMethylation" 
##              Neandertal Seq                      S SNPs 
##                "ntSeqReads"                 "ntSssSnps" 
##            Sel Swp Scan (S)             Denisova Methyl 
##          "ntSssZScorePMVar"       "denisovaMethylation" 
##                Denisova Seq           Denisova Variants 
##            "dhcBamDenisova"       "dhcVcfDenisovaPinky" 
##            Mod Hum Variants              Modern Derived 
##              "dhcVcfModern"           "dhcHumDerDenAnc" 
##            Common SNPs(150)            Common SNPs(147) 
##              "snp150Common"              "snp147Common" 
##            Common SNPs(146)            Common SNPs(144) 
##              "snp146Common"              "snp144Common" 
##            Common SNPs(142)            Common SNPs(141) 
##              "snp142Common"              "snp141Common" 
##               All SNPs(150)               All SNPs(147) 
##                    "snp150"                    "snp147" 
##               All SNPs(146)               All SNPs(144) 
##                    "snp146"                    "snp144" 
##               All SNPs(142)               All SNPs(141) 
##                    "snp142"                    "snp141" 
##           Flagged SNPs(150)           Flagged SNPs(147) 
##             "snp150Flagged"             "snp147Flagged" 
##           Flagged SNPs(146)           Flagged SNPs(144) 
##             "snp146Flagged"             "snp144Flagged" 
##           Flagged SNPs(142)           Flagged SNPs(141) 
##             "snp142Flagged"             "snp141Flagged" 
##             Mult. SNPs(150)             Mult. SNPs(147) 
##                "snp150Mult"                "snp147Mult" 
##             Mult. SNPs(146)             Mult. SNPs(144) 
##                "snp146Mult"                "snp144Mult" 
##             Mult. SNPs(142)             Mult. SNPs(138) 
##                "snp142Mult"                "snp138Mult" 
##               All SNPs(138)            Common SNPs(138) 
##                    "snp138"              "snp138Common" 
##           Flagged SNPs(138)            1000G Ph1 Accsbl 
##             "snp138Flagged"    "tgpPhase1Accessibility" 
##              1000G Ph1 Vars            1000G Ph3 Accsbl 
##                 "tgpPhase1"    "tgpPhase3Accessibility" 
##              1000G Ph3 Vars              DGV Struct Var 
##                 "tgpPhase3"                   "dgvPlus" 
##                EVS Variants                        ExAC 
##                "evsEsp6500"                      "exac" 
##             Genome Variants                 GIS DNA PET 
##                     "pgSnp"         "wgEncodeGisDnaPet" 
##               HAIB Genotype                 HapMap SNPs 
##      "wgEncodeHaibGenotype"                "hapmapSnps" 
##            HGDP Allele Freq              SNP/CNV Arrays 
##                   "hgdpGeo"            "genotypeArrays" 
##                RepeatMasker            Interrupted Rpts 
##                      "rmsk"             "nestedRepeats" 
##              Microsatellite              NumtS Sequence 
##                  "microsat"                   "numtSeq" 
##              Segmental Dups                  Self Chain 
##          "genomicSuperDups"                 "chainSelf" 
##              Simple Repeats 
##              "simpleRepeat"

Bringing in External data. Tracks from UCSC

ucscTrack <- UcscTrack(genome = "hg19", 
                       chromosome = "chr7", 
                       track = "ensGene", 
                       from = 26591341, 
                       to = 27034958, 
                       trackType = "GeneRegionTrack", 
                       rstarts = "exonStarts", 
                       rends = "exonEnds", 
                       gene ="name", 
                       symbol = "name2", 
                       transcript = "name", 
                       strand = "strand" 
) 

To build the UCSC annotation as a GeneRegionTrack we must specify some information specific to GeneRegionTrack objects. This includes the “rstarts” and “rends”. You can consult the help for GeneRegionTrack() (?GeneRegionTrack to see from in R) to see full parameters required for UcscTrack objects.

Bringing in External data. Tracks from UCSC

Now we can compare the Ensembl gene builds from the two different sources.

Notable differences in the annotation include the absense of some transcipts due to the additional filter applied in our BiomartGeneRegionTrack object creation.

plotTracks(c(bgrTrack,ucscTrack), 
           from = 26591341,to = 27034958) 

Bringing in External data. Tracks from UCSC as DataTrack

conservationTrack <- UcscTrack(genome = "hg19",
                               chromosome = "chr5",
                               track = "Conservation",
                               table = "phyloP100wayAll",
                               from = 135313003,to = 135313570,
                               trackType = "DataTrack",
                               start = "start", end = "end", 
                               data = "score",type = "hist", window = "auto",
                               col.histogram = "darkblue",fill.histogram = "darkblue",
                               ylim = c(-3.7, 4),name = "Conservation") 

Exercises

Time for exercises! Link here


Solutions

Time for solutions! Link here